High-Performance Multi-Pass Unification Parsing
Abstract
Parsing natural language is an attempt to discover structure in a text (or textual representation) produced by a person. This structure can be put to a variety of uses, including machine translation, grammar-conformance checking, and determination of prosody in text-to-speech tasks. Recent theories of syntax use unification to better describe the intricacies of natural language [137]. In parsing systems, unification has either been added to a context-free base system [152, 40, 4, 23] or has replaced the context-free base entirely [118, 135, 45] (possibly reintroducing it later [136]). The seemingly small step of adding unification has opened a Pandora's box of computational complexity, raising the difficulty of the problem from polynomial [48] to somewhere between NP-complete and intractable, depending on the details of the unification system and how it is added [10]. Worse, adding unification to a context-free base parser can break the packing technique used to handle ambiguity, leading in practice to exponential blow-ups in the parser's space and time requirements. I propose a multi-pass strategy to avoid these problems in practice. I describe a parser that combines shallow, simple value unification with approximation techniques in order to find a covering packed parse forest. This parse forest is then searched for the single best fully unifying value; the scoring system that drives this heuristic search encodes linguistically based disambiguation preferences. The resulting two-pass parser is compared with an ordinary single-pass parser in the context of a heavyweight knowledge-based machine translation system. The two-pass parser is shown to be competitive with the single-pass parser on average data, in both time and space. It is also shown to avoid a common class of ambiguity blow-up to which the single-pass parser is subject.
These results indicate that the multi-pass technique, which interleaves only some of the unification equations into the parse, is the superior approach for heavyweight unification parsing.
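The two-pass idea can be illustrated on a toy grammar. This is a minimal sketch under stated assumptions, not the parser described in the abstract: the first pass is plain CKY over context-free categories only (a cruder approximation than the shallow value unification the abstract describes), yielding a packed forest with one entry per (span, category); the second pass walks that forest, applies full unification of agreement features, and keeps a small beam of the best-scoring candidates per node. The grammar, lexicon, weights, and the names `unify`, `pass1_forest`, `pass2_best`, and `parse` are hypothetical, and feature structures are nested dicts without reentrancy (structure sharing).

```python
from collections import defaultdict
from itertools import product


def unify(a, b):
    """Unify two feature structures (nested dicts with string atoms).
    Returns the merged structure, or None on a clash. No reentrancy."""
    if isinstance(a, str) or isinstance(b, str):
        return a if a == b else None
    out = dict(a)
    for key, val in b.items():
        if key in out:
            merged = unify(out[key], val)
            if merged is None:
                return None
            out[key] = merged
        else:
            out[key] = val
    return out


def pass1_forest(words, lexicon, rules):
    """Pass 1 (assumed approximation): CKY over categories only, ignoring
    features. Each (i, j, cat) entry packs all alternatives for that span."""
    n = len(words)
    forest = defaultdict(list)
    for i, w in enumerate(words):
        for cat, fs, score in lexicon.get(w, []):
            forest[(i, i + 1, cat)].append(("lex", w, fs, score))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for rule in rules:
                    lhs, left, right = rule[0], rule[1], rule[2]
                    if (i, k, left) in forest and (k, j, right) in forest:
                        forest[(i, j, lhs)].append(("rule", rule, k))
    return forest


def pass2_best(forest, i, j, cat, beam=4, memo=None):
    """Pass 2: return up to `beam` (score, fs, tree) candidates for
    (i, j, cat) whose feature structures fully unify, best first."""
    if memo is None:
        memo = {}
    key = (i, j, cat)
    if key in memo:
        return memo[key]
    cands = []
    for alt in forest.get(key, []):
        if alt[0] == "lex":
            _, word, fs, score = alt
            cands.append((score, fs, (cat, word)))
            continue
        _, (lhs, left, right, shared, score), k = alt
        lefts = pass2_best(forest, i, k, left, beam, memo)
        rights = pass2_best(forest, k, j, right, beam, memo)
        for (ls, lfs, lt), (rs, rfs, rt) in product(lefts, rights):
            # Unify only the features this rule shares between daughters.
            mother = unify({f: lfs[f] for f in shared if f in lfs},
                           {f: rfs[f] for f in shared if f in rfs})
            if mother is not None:
                cands.append((score + ls + rs, mother, (lhs, lt, rt)))
    cands.sort(key=lambda c: c[0], reverse=True)
    memo[key] = cands[:beam]  # heuristic beam pruning
    return memo[key]


def parse(sentence, lexicon, rules, start="S"):
    words = sentence.split()
    forest = pass1_forest(words, lexicon, rules)
    best = pass2_best(forest, 0, len(words), start)
    return best[0] if best else None


# Hypothetical toy grammar: (lhs, left, right, shared features, weight).
RULES = [
    ("S", "NP", "VP", ("num",), 0.0),
    ("NP", "Det", "N", ("num",), 0.0),
]
# Hypothetical lexicon: word -> [(category, features, weight)].
LEXICON = {
    "the": [("Det", {}, 0.0)],
    "sheep": [("N", {"num": "sg"}, -0.5), ("N", {"num": "pl"}, -0.7)],
    "dogs": [("N", {"num": "pl"}, 0.0)],
    "runs": [("VP", {"num": "sg"}, 0.0)],
    "run": [("VP", {"num": "pl"}, 0.0)],
}

if __name__ == "__main__":
    print(parse("the sheep run", LEXICON, RULES))
    print(parse("the dogs runs", LEXICON, RULES))  # None: agreement clash
```

In this sketch "the dogs runs" receives an S entry in the first-pass forest but no fully unifying analysis in the second pass, which shows that the first pass is a covering over-approximation. The per-node beam keeps the second pass from enumerating every derivation packed in the forest, at the cost of possibly pruning a candidate that a higher node would have needed.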
Similar resources
A Comparative Study of the Effect of Part-of-Speech Tagging on Parsing in Automatic Processing of the Persian Language
In this paper, the role of part-of-speech (POS) tagging in parsing for automatic processing of the Persian language is studied. To this end, both the impact of POS-tagging quality and the impact of the amount of information carried by the POS tags on parsing are examined. To achieve these goals, three parsing scenarios are proposed and compared. In the first scenario, the parser assigns...
Improving Multi-pass Transition-Based Dependency Parsing Using Enhanced Shift Actions
In the multi-pass transition-based dependency parsing algorithm, the shift actions are usually inconsistent for the same node pair across different passes. Some node pairs do have a dependency relation, but the modifier node is not yet a complete subtree. The bottom-up parsing strategy requires a shift action to be performed for these node pairs. In this paper, we propose a method to improve perform...
Computing Phrasal-signs in HPSG prior to Parsing
This paper describes techniques to compile lexical entries in HPSG (Pollard and Sag, 1987; Pollard and Sag, 1993)-style grammar into a set of finite-state automata. The states in the automata are possible signs derived from lexical entries and contain information raised from the lexical entries. The automata are augmented with feature structures used by a partial unification routine and delayed/froze...
Efficient Multi-Pass Decoding for Synchronous Context Free Grammars
We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model. The trigram pass closes most of the performance gap between a bigram decoder and a much...
Publication date: 2002